Distinguishing light mesons that contain only up/down quarks (pions) from those containing a
strange quark (kaons) over the typical meter length scales of a particle physics detector requires
instrumentation capable of measuring flight times with a resolution on the order of 20ps. In the last
few years a large number of inexpensive, multi-channel Time-to-Digital Converter (TDC) chips have
become available. These devices typically have timing resolution in the hundreds of ps
regime. A technique is presented that is a monolithic version of the "time stretcher" solution adopted for
the Belle Time-Of-Flight system to address this gap between the required resolution and intrinsic multi-hit
TDC performance.



I. BACKGROUND 

Particle identification in the Belle experiment is
based upon a composite system of subdetectors, as
illustrated in Fig. 1. This hybrid system consists of
ionization loss measurements (dE/dx) in the Central
Drift Chamber (CDC), Cherenkov light emission
measurement in the barrel and endcap Aerogel
Cherenkov Counters (ACC), and flight time measurement
in the Time Of Flight (TOF) system. As indicated
in the lower section of this figure, the three
systems work together to cover the momentum range
of interest.

Of these recording systems, the TOF system 
makes the most severe demands on time resolution. 
Indeed, given the 2ns spacing between RF buckets 
(and possible collisions), it is not known at recording 
time to which collision a given particle interaction in 
the TOF system corresponds. 
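
To make the timing scale concrete, the following minimal sketch estimates the kaon-pion flight time difference from relativistic kinematics. The 1 m straight flight path and the function names are assumptions chosen for illustration, not the Belle geometry; the masses are rounded values for the charged pion and kaon.

```python
import math

C_M_PER_NS = 0.299792458  # speed of light in m/ns
M_PION = 0.13957          # charged pion mass in GeV/c^2 (rounded)
M_KAON = 0.49368          # charged kaon mass in GeV/c^2 (rounded)

def flight_time_ns(p_gev, mass_gev, path_m):
    """Flight time over a straight path for momentum p and mass m."""
    energy = math.hypot(p_gev, mass_gev)  # E = sqrt(p^2 + m^2)
    beta = p_gev / energy
    return path_m / (beta * C_M_PER_NS)

def kaon_pion_separation_ps(p_gev, path_m=1.0):
    """Kaon minus pion flight time, in picoseconds (path_m is assumed)."""
    dt_ns = (flight_time_ns(p_gev, M_KAON, path_m)
             - flight_time_ns(p_gev, M_PION, path_m))
    return 1000.0 * dt_ns

if __name__ == "__main__":
    for p in (0.5, 1.0, 1.5, 2.0):
        dt = kaon_pion_separation_ps(p)
        print(f"p = {p:.1f} GeV/c: dt(K - pi) = {dt:.0f} ps")
```

Under these assumptions the difference is roughly 350 ps at 1 GeV/c and falls below 100 ps by 2 GeV/c, which is why a resolution at the level of tens of ps is needed.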

FIG. 1: Depiction of the composite detector configuration
used by Belle for particle identification.



Precision time recording in a very high rate environment
requires an encoding scheme capable of continuous
recording, with a minimum of deadtime per
logged event. At the time of the construction of the
Belle experiment [1] at the KEK B-factory [2], a decision
was made to unify the entire detector readout
(except for the silicon vertex detector) on the LeCroy
1877 Multi-hit TDC Module. This Fastbus module
is based upon the MTD132A ASIC [3], which has
a 0.5ns resolution encoding, comparable to a number
of similar devices [4], [5], [6]. Given the limited
manpower for DAQ system development and maintenance,
this proved to be a wise choice. The intrinsic
time resolution was quite adequate for recording
the timing information from the CDC, as well as the
amplitude information (through use of a charge-to-time
converter) for the CDC and ACC.

The challenge then was to record PMT hits with
20ps resolution, using a multi-hit TDC having a
500ps least count, for collisions potentially
separated by only 2ns. This latter constraint
meant that traditional techniques using a common
start or stop could not be applied, since the bunch
collision of interest was not known at the time at
which the hits need to be recorded. Moreover, in order
to avoid incurring additional error from comparison
with a separate fiducial time, it is desirable to
reference all time measurements directly to the
accelerator RF clock. The solution adopted was a
so-called Time Stretcher circuit, developed by one of
the authors in conjunction with the LeCroy Corporation [8].
This work built upon valuable lessons
learned in developing a similar recording system for
the Particle Identification Detector system of the
CPLEAR experiment [9]. The principle of operation
is seen in Fig. 2. Hits are time-dilated with respect
to the accelerator clock and recorded at coarser
resolution, but in direct proportion to the stretch factor
employed. By also logging the raw hits, this stretch
factor can be determined statistically from the data.
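
The statistical determination of the stretch factor can be sketched as a straight-line fit of the stretched interval against the raw interval. This is only an illustrative toy: the simulated data, the 16-32 ns interval range, the function names and the use of an ordinary least-squares fit are assumptions, not the procedure used in the Belle analysis.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + offset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def simulate_hits(n_hits, stretch_factor, lsb_ns=0.5, seed=1):
    """Hypothetical toy data: the raw and the stretched interval of each
    hit are both digitized with the same coarse TDC least count."""
    rng = random.Random(seed)
    raw, stretched = [], []
    for _ in range(n_hits):
        t = rng.uniform(16.0, 32.0)  # true interval in ns (assumed range)
        raw.append(round(t / lsb_ns) * lsb_ns)
        stretched.append(round(t * stretch_factor / lsb_ns) * lsb_ns)
    return raw, stretched

raw_ns, stretched_ns = simulate_hits(20000, stretch_factor=20.0)
estimated_sf, _ = fit_line(raw_ns, stretched_ns)
print(f"estimated stretch factor: {estimated_sf:.2f}")
```

With many hits the fitted slope converges on the stretch factor even though each individual raw measurement is only known to the coarse least count.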

High and low level discriminators are used, with
the high-level threshold used to reject background
photon hits in the TOF and the low-level threshold
used to provide the best possible leading edge timing.
The charge of triggered events is also measured with a
charge-to-time (Q-to-T) ASIC, whose output is recorded
with the same common TDC module. Charge recording is
needed to correct for amplitude dependent timing
effects in the discriminator itself.
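
The amplitude dependence of leading-edge timing (time walk) can be illustrated with a deliberately simple model in which the pulse rises linearly to its peak. The linear edge, the numerical values and the function names below are assumptions for illustration only; real PMT pulse shapes and the correction actually applied in Belle are not described here.

```python
def crossing_time_ns(amplitude_mv, threshold_mv, rise_time_ns):
    """Threshold-crossing time for a pulse whose leading edge rises
    linearly from zero to amplitude_mv in rise_time_ns (assumed shape)."""
    if amplitude_mv <= threshold_mv:
        raise ValueError("pulse never crosses the threshold")
    return rise_time_ns * threshold_mv / amplitude_mv

def time_walk_ps(small_mv, large_mv, threshold_mv, rise_time_ns):
    """Timing shift between a small and a large pulse, in picoseconds."""
    dt_ns = (crossing_time_ns(small_mv, threshold_mv, rise_time_ns)
             - crossing_time_ns(large_mv, threshold_mv, rise_time_ns))
    return 1000.0 * dt_ns

# Hypothetical numbers: 2 ns rise time and a 10 mV low-level threshold.
walk = time_walk_ps(small_mv=100.0, large_mv=400.0,
                    threshold_mv=10.0, rise_time_ns=2.0)
print(f"time walk between 100 mV and 400 mV pulses: {walk:.0f} ps")
```

Even in this toy the shift is far larger than the 20ps target, which is why the recorded charge is needed as an input to the timing correction.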



FIG. 2: Timing diagram illustrating the operating prin- 
ciple of the Time Stretcher circuit, as explained in the 
text. 



As seen in Fig. 2, four timing edges are recorded
for each hit signal. The leading edge corresponds to
the actual output time of the discriminator. This
rising edge is paired with a falling edge, corresponding
to the 2nd accelerator reference clock (RF clock
divided by 16) occurring after the initial hit timing.
The interval of interest is then bounded to be
between about 16 and 32 ns. With a TDC least count of
0.5ns, a factor of twenty time expansion (the stretch
factor) is needed. In the figure the third edge
corresponds to the time-expanded version of the
interval between the rising and falling edges. A benefit
of this technique is that it provides self-calibration.
By recording a large number of events, the stretch
factor can be extracted from the data itself, since
both the raw and expanded signals are recorded. A 4th
edge is provided, two clock rising edges after the 3rd
edge, to provide a return to a known state before the
next pulse. An obvious drawback of this scheme is that
the deadtime for each hit will be something like
320-640ns, as will be discussed later.
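
The numbers quoted above follow from simple arithmetic, sketched here with hypothetical helper names. The only ingredient not taken from the text is the standard result that a uniform quantization step of width w contributes an RMS error of w divided by the square root of 12.

```python
import math

def effective_lsb_ps(tdc_lsb_ns, stretch_factor):
    """TDC least count referred back to the unstretched input, in ps."""
    return 1000.0 * tdc_lsb_ns / stretch_factor

def quantization_rms_ps(lsb_ps):
    """RMS error of an ideal uniform quantizer with step lsb_ps."""
    return lsb_ps / math.sqrt(12.0)

def stretched_window_ns(min_interval_ns, max_interval_ns, stretch_factor):
    """Range of stretched output widths for the bounded input interval."""
    return (min_interval_ns * stretch_factor,
            max_interval_ns * stretch_factor)

lsb = effective_lsb_ps(tdc_lsb_ns=0.5, stretch_factor=20)
print(f"effective least count: {lsb:.1f} ps")
print(f"ideal quantization RMS: {quantization_rms_ps(lsb):.1f} ps")
print("stretched width range (ns):", stretched_window_ns(16, 32, 20))
```

A 0.5ns least count stretched by twenty corresponds to 25 ps at the input, and the 16-32 ns interval becomes the 320-640 ns output width that sets the per-hit deadtime.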

In more detail, the signal chain of the current Belle
TOF electronics [7] is sketched in Fig. 3.

FIG. 3: Time Of Flight Front-End Electronics readout
flow. Precision timing is performed with a coarse,
multi-hit TDC (LeCroy 1877) by means of a time-stretcher
circuit.



II. SUPER B FACTORY 

The TOF readout system has worked well for almost
a decade. Increased luminosity (already 60%
over design) has led to much higher single channel
rates than had been specified in the design. From
the beginning, the maximum design specification
was 70kHz of single particle interaction rate for each
channel. At this rate the expected inefficiency would
be a few percent, comparable to the geometric
inefficiency (due to cracks between scintillators).
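
The "few percent" figure can be cross-checked against the 320-640ns per-hit deadtime with a rough estimate. The sketch assumes Poisson-distributed hits and a fixed, non-extending deadtime, for which the lost fraction is r*tau / (1 + r*tau); this model and the function name are assumptions, not taken from the Belle design documents.

```python
def lost_fraction(rate_hz, deadtime_ns):
    """Fraction of hits lost for Poisson arrivals at rate_hz and a fixed,
    non-extending deadtime (assumed model): r*tau / (1 + r*tau)."""
    x = rate_hz * deadtime_ns * 1e-9
    return x / (1.0 + x)

for tau_ns in (320.0, 640.0):
    loss = 100.0 * lost_fraction(70e3, tau_ns)
    print(f"{tau_ns:.0f} ns deadtime at 70 kHz: {loss:.1f}% of hits lost")
```

At 70kHz this gives a loss of roughly 2% to 4%, in line with the design expectation, and the loss grows almost linearly as the singles rate rises beyond the specification.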

Already the world's highest luminosity collider,
the KEKB accelerator [2] can now produce in excess
of one million B meson pairs per day. Upgrade
plans call for increasing this luminosity by a factor
of 30-50, providing huge data samples of 3rd generation
quark and lepton decays. Precise interrogation
of Standard Model predictions will be possible, if
a clean operating environment can be maintained.
Extrapolation of current occupancies to this higher
luminosity mandates an upgrade of the readout
electronics. The current system already suffers from
significant loss of efficiency at higher background
rates, as may be seen in Fig. 4.

FIG. 4: Composite TOF inefficiency for the last 3 years
of running at Belle. Inefficiency grows with higher TOF
singles rates, which have increased with increased
luminosity, well beyond the 70kHz design specification.

III. PARTICLE IDENTIFICATION IMPROVEMENT

In considering an upgrade to the TOF readout
electronics, it is worthwhile to consider the needs
of an upgraded PID system for Belle. A comparative
study of the Belle system, as depicted in Fig. 1,
with the BaBar [10] PID system is informative.
It is clear in Fig. 5 that the Detection of Internally
Reflected Cherenkov light (DIRC) detector of BaBar has a
higher efficiency and lower fake rate than the hybrid
TOF/ACC scheme used by Belle.

Indeed, it was realized in the construction stage
of Belle that such a DIRC-type detector would have
merits, and prototypes were explored [11]. While
these results were very promising, the schedule risks
led the collaboration to stick with technologies in
which significant time and effort had already been
invested.

Thinking about an upgrade, it is reasonable to
revisit the choice of technology. In the intervening
decade, significant progress has been made in the
development of Ring Imaging CHerenkov (RICH)
detectors [12], [13], as well as detectors based upon the
arrival time of the Cherenkov photons, such as the
Correlated Cherenkov Timing (CCT) [14] and Time
Of Propagation (TOP) [15] counters.

Because of the great cost incurred in the
procurement and construction of the CsI crystal
calorimeter, it is planned not to upgrade the barrel
section. As a consequence, the volume available for
the TOF/ACC replacement detector is rather limited.
Therefore a RICH type detector has not been
pursued. The most promising technologies to date
are those illustrated in Fig. 6. The TOP concept
uses timing in place of one of the projected spatial
dimensions to reconstruct the Cherenkov emission
ring. A focusing DIRC principally uses geometry
to reconstruct the Cherenkov ring segments. However,
in this case precision timing is still very useful
for two important reasons. First, it allows for the
possibility of using timing to correct for chromatic
dispersion in the emission angle of the Cherenkov
photon. Second, fine timing allows time of flight
to be measured using the quartz radiator bar.

Therefore, in both of the viable detector options
considered, a large number of recording channels
with fine timing resolution are required. In the case
of a finely segmented focusing DIRC [16] option, the
number of readout channels could be comparable to
that of the current silicon vertex detector. Clearly, if
such a detector is to be viable, significant integration
of the readout electronics will be essential.

Not shown is a proposal for a multi-segmented
TOF detector consisting of short scintillator bars.
While this option remains viable (and the electronics
presented would work well with such a system),
the PID performance degradation of such a system
is probably unacceptable. Of the choices listed, the
most attractive in terms of performance is a focusing
DIRC detector, if the issues of the photodetector
and readout can be addressed.

Either as an upgrade of only the readout electron- 
ics or as a prototype for a higher channel count PID 
detector, it is worth considering improvements to the 
existing readout. 

IV. THE MONOLITHIC TIME STRETCHER 

The Time Stretcher technique has worked very
well and Belle has been able to maintain approximately
100ps resolution performance with the TOF
system. A slow degradation with time is consistent
with loss of light output. Detailed Monte-Carlo
simulation [17] has been able to reproduce much of the
performance of the TOF system and the degradation
is consistent with light loss due to crazing of the
scintillator surface. A larger concern is the significant
degradation of TOF system performance due to
high hit rates. While the multi-hit TDC is capable
of keeping up with high rates (though the limited
number of recorded edges (16) also leads to
inefficiency), by its very nature, the Time Stretcher
output cannot be significantly shortened. Recently, the
clock speed was doubled, to help reduce this effect.
Nevertheless, at ever higher hit rates, the deadtime
leads to ever increasing inefficiency.

A logical solution to this problem is to introduce
a device which has buffering. Also, while making the
effort to reduce the deadtime, it makes sense to consider
a much more compact form-factor. This was
done with a view toward moving to a larger
number of readout channels in a future Belle PID
upgrade [18], as mentioned earlier. One proposed
solution is the Monolithic Time Stretcher (MTS) chip,
a prototype of which is shown in Fig. 7.

The fundamental logic of the device is identical to 
that currently in use with two major changes: 

1. High density 

2. Multi-hit 

High density is achieved by replacing discrete 
Emitter-Coupled Logic components on daughter 
cards with a full custom integrated circuit. This 
higher integration permits having multiple time 
stretcher channels for each input. By toggling to 
a secondary output channel, the deadtime can be 
significantly reduced. Once a hit is processed in one 
output channel, the next is armed to process a sub- 
sequent hit. 

FIG. 5: A direct comparison of PID technologies for the B-factory detectors. On the left, the performance of the Belle
hybrid TOF/ACC system; on the right, a similar plot for the BaBar DIRC system. In both cases at lower momentum
the K/π separation is enhanced through the use of drift-chamber dE/dx information. As may be seen, the overall fake
rate is lower and efficiency higher for the DIRC.




[Fig. 6 labels: *200 counters = 1440 channels; Multi-hit (hidden cost) >1440 channels; Trajectory]



FIG. 6: Concept figures of 3 of the Cherenkov ring imaging detectors that have been considered for the Belle detector
upgrade. While simplest, the "Bar TOP" (Time-Of-Propagation) detector has been ruled out due to inadequate
performance. Of the remaining two, the number of instrumented readout channels will depend upon the photodetector
chosen, though either will likely require many tens of thousands of readout channels, dictating a monolithic approach.



In Fig. 7 the 8 channel repeating structure of each
time stretcher circuit is clearly seen in the die
photograph. The basics of the time-stretcher circuit are
visible in Fig. 8. A one-shot circuit at the upper left
leads to an immediate output signal, and also starts
charging current $I_{\mathrm{hi}}$. Pipelining of the hit
signal continues for two clock cycles, after which current
$I_{\mathrm{hi}}$ is switched off and discharge current
$I_{\mathrm{lo}}$ is switched on.
A comparator monitors the voltage induced on the
storage capacitor due to charging and discharging,
providing an output signal to indicate the stretched
time when the voltage is discharged.

FIG. 8: Schematic of the basic clocked time stretcher circuit.

The stretch factor is given by the ratio of the two
currents: $SF = I_{\mathrm{hi}} / I_{\mathrm{lo}}$. Each input
channel of the MTS1 has two time stretcher circuits, the second
corresponding to the secondary output when the primary
channel is active. Each output is recorded by
a separate TDC channel. With this configuration a
10% deadtime for a single channel of time stretcher
can be reduced to 1%. As the incremental cost of
additional TDC channels is rather low, it is possible
to consider additional buffering depths, which would
reduce the deadtime fraction to $dT^{N}$, where $dT$ is
the single-channel deadtime fraction and $N$ is the
buffer depth, though that was not explored beyond
a depth of two in this device.
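
The relation between the two currents and the stretch factor can be illustrated with an idealized dual-slope model: the storage capacitor is charged at a constant current for the input interval and then discharged at a smaller constant current back to its baseline. The ideal current sources, the component values and the function names are assumptions for illustration and do not describe the actual MTS1 circuit values.

```python
def stretched_time_ns(interval_ns, i_hi_ua, i_lo_ua, cap_pf=1.0):
    """Ideal dual-slope stretcher: charge with i_hi for interval_ns, then
    discharge with i_lo until the capacitor returns to its baseline."""
    # uA * ns / pF gives the peak voltage in mV.
    peak_mv = i_hi_ua * interval_ns / cap_pf
    # mV * pF / uA gives the discharge time in ns.
    return peak_mv * cap_pf / i_lo_ua

def stretch_factor(i_hi_ua, i_lo_ua):
    """SF = I_hi / I_lo, independent of the capacitor value."""
    return i_hi_ua / i_lo_ua

# Hypothetical currents giving a stretch factor of 20.
print(stretch_factor(200.0, 10.0))
print(stretched_time_ns(24.0, 200.0, 10.0))
print(stretched_time_ns(24.0, 200.0, 10.0, cap_pf=2.5))
```

In this ideal model the capacitance cancels out, so the stretch factor depends only on the current ratio.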

Reduction of cross-talk and Electro-Magnetic
Interference is enhanced by the use of Low Voltage
Differential Signalling (LVDS) [20]. MTS1 is fabricated
in the Taiwan Semiconductor Manufacturing
Corporation 0.35 μm CMOS process.



A. Form Factor Reduction

When considering a photodetector with a large
number of channels, the form factor of this device is
very attractive. As shown for comparison in Fig. 9, a
substantial reduction in size has been achieved. On
the left is a 16-channel Fastbus-sized Time Stretcher
card used currently in the Belle experiment. Inset is
a test board with one of the MTS1 packaged devices
for comparison, where a dime has been placed on the
board for scale.

FIG. 9: A form-factor comparison between the current
Fastbus-sized, 16-channel Time Stretcher and the MTS1
chip on a test board. The test board occupies almost the
same space as a single daughtercard channel on the TS
motherboard, and has the same number of channels of
time-stretching as the whole module.

With this level of integration it becomes feasible to
consider integration of the time stretcher and TDC
electronics on detector, as is being done for detector
subsystems in the LHC experiments.

B. Test Results

In order to test the performance of the MTS1,
a multi-hit TDC should be used. As a demonstration
of the power of this time stretching technique,
a Field Programmable Gate Array (FPGA) can
be used as this TDC [21]; the results from
a simple Gray-code counter implementation of the
hit time recording may be seen in Fig. 10. The
RMS of the distribution is about 840ps for the Xilinx
Spartan-3 device used. This resolution could be
improved by use of a faster FPGA, though it is sufficient
to obtain the test results shown below.

Indeed, it is worth noting that this combined
Time-Stretcher + FPGA technique is very powerful
for two important reasons:

1. low-cost, high-density TDC implementation

2. deep and flexible hit buffering and trigger
matching logic
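
The Gray-code counter mentioned above for the FPGA-based TDC can be illustrated in software. In a Gray code only one bit changes between consecutive counts, so a counter value latched by an asynchronous hit while the counter is changing is wrong by at most one count. The functions below are the standard binary/Gray conversions; they are a sketch of the encoding only, not the firmware of reference [21].

```python
def binary_to_gray(n):
    """Convert a non-negative integer to its reflected Gray code."""
    return n ^ (n >> 1)

def gray_to_binary(g):
    """Invert binary_to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

for count in range(8):
    gray = binary_to_gray(count)
    print(f"{count}: binary {count:03b} -> gray {gray:03b}")
```

Decoding the latched Gray value back to binary gives the coarse hit time in counter ticks, which is then combined with the stretch factor offline.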

FIG. 10: Timing resolution obtained for the FPGA-based
TDC used in the MTS1 evaluation.



A test sweep of the MTS1 input is shown in
Fig. 11, where it should be noted that due to the
encoding scheme it is only meaningful to scan within a
time expansion clock cycle period. A scan of expansion
ratios was performed and the best results were
obtained for stretch factors of 40-50.

As can be seen, there is some non-linearity in the
expanded time. This is more clearly seen when a plot
of the residual distribution is made by subtracting off
the linear fit, as shown in Fig. 12. A periodic structure
is seen, roughly consistent with the expansion
clock period, if the negative timing dips are correlated
to transition edges.
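
The residual study and a table-based non-linearity correction can be sketched as follows. Everything numerical here is a toy assumption (sinusoidal non-linearity, its amplitude and period, the noise level, the 5-25 ns scan range and the bin count); the correction table is built from one simulated data set and applied to a separate one.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + offset."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def make_scan(n, seed, stretch=45.0, wiggle_ns=0.4, period_ns=8.0,
              noise_ns=0.1):
    """Hypothetical scan: linear stretch plus a periodic non-linearity."""
    rng = random.Random(seed)
    xs = [rng.uniform(5.0, 25.0) for _ in range(n)]
    ys = [stretch * x + wiggle_ns * math.sin(2 * math.pi * x / period_ns)
          + rng.gauss(0.0, noise_ns) for x in xs]
    return xs, ys

def bin_index(x, lo, hi, n_bins):
    i = int((x - lo) / (hi - lo) * n_bins)
    return min(max(i, 0), n_bins - 1)

def build_table(xs, res, lo, hi, n_bins):
    """Mean residual in each input-time bin."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, r in zip(xs, res):
        i = bin_index(x, lo, hi, n_bins)
        sums[i] += r
        counts[i] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

LO, HI, BINS = 5.0, 25.0, 80
cal_x, cal_y = make_scan(5000, seed=1)           # calibration set
slope, offset = fit_line(cal_x, cal_y)
cal_res = [y - (slope * x + offset) for x, y in zip(cal_x, cal_y)]
table = build_table(cal_x, cal_res, LO, HI, BINS)

test_x, test_y = make_scan(5000, seed=2)         # independent set
raw_res = [y - (slope * x + offset) for x, y in zip(test_x, test_y)]
corr_res = [r - table[bin_index(x, LO, HI, BINS)]
            for x, r in zip(test_x, raw_res)]
print(f"residual RMS before: {rms(raw_res):.3f} ns, "
      f"after: {rms(corr_res):.3f} ns (stretched units)")
```

Dividing the residuals by the fitted slope converts them from stretched time back to input time.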




FIG. 11: Scan of stretched times versus input reference
time, within a single stretch clock cycle. In this case a 
stretch factor of about was used. 

[Fig. 12 horizontal axis: first pulse width (ns)]



FIG. 12: Residual distribution after a linear fit and
application of the time stretch factor. The effect of the
expansion clock is clearly seen.



1. Timing Resolution

As with the HPTDC device [19] developed at
CERN for the ALICE detector, a fine calibration
is needed to obtain a precision comparable to the
current Belle system. Applying such a calibration,
determined in a separate data set, significantly
improved linearity and residuals are obtained. The
subsequent results are histogrammed in Fig. 13.

FIG. 13: Timing resolution of the MTS1 + FPGA TDC
when a non-linearity correction is applied. Non-gaussian
tails are due to regions with larger jitter due to coupling
of the reference clock into the ramping circuitry.

As can be seen, the timing resolution fits well to
a double Gaussian, with a narrow sigma less than
20ps, which is comparable to (and actually slightly
better than) the existing Belle system. This result
is consistent with the expectation from the FPGA
TDC used, and the measured sigma is about 15ps.
It is possible that a finer resolution FPGA TDC would
allow for an even more precise timing determination.

In practice the systematic effects of the upstream
discriminator and its amplitude dependent threshold
crossing (and comparator overdrive) dependence
make any further improvements difficult. Nevertheless
it is an interesting question for future exploration.
This timing resolution is comparable to
that obtained with the HPTDC after careful
non-linearity calibration.

The broader Gaussian distribution and significant
non-gaussian tails are correlated with expansion
clock feedthrough to the ramping circuit. This
could be improved in a future version with better
layout isolation. The 0.35 μm process used only had
3 metal routing layers available and migration to a
finer feature size process would allow for dedicated
shields and better power routing.
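
As a rough cross-check of the consistency between the measured narrow sigma and the FPGA TDC performance, one can assume that the TDC contribution simply scales down by the stretch factor. This scaling is an assumption made here for illustration, not an expression taken from the text; it uses the quoted 840ps TDC RMS and the 40-50 range of stretch factors.

```latex
\sigma_{\mathrm{expected}} \approx \frac{\sigma_{\mathrm{TDC}}}{SF},
\qquad
\frac{840\ \mathrm{ps}}{40} = 21\ \mathrm{ps},
\qquad
\frac{840\ \mathrm{ps}}{50} = 16.8\ \mathrm{ps}
```

Under this assumption the TDC-limited resolution falls in the range of about 17 to 21 ps, the same scale as the narrow sigma reported.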



2. Multi-hit Buffering 

In order to reduce deadtime, a second time-stretcher
circuit, with a separate output, is provided
for each input channel. This second circuit becomes
armed when the primary stretcher circuit is running.
Use of such a scheme can significantly reduce data
loss due to the arrival of a subsequent hit during
operation of the first stretcher circuit. The resulting
deadtime fraction may be expressed as



$$F_{\mathrm{dead}} = F_{\mathrm{single}}^{N} \qquad (2)$$



where N is the number of buffer stages. For N=2,
the case prototyped here, a large existing deadtime
of 20% could be reduced to 4%. Moreover, this technique
can be extended to an even larger number of
buffer channels, a realistic possibility when using a
low cost FPGA-based TDC. In the case of 4 outputs,
a 20% single time stretcher deadtime would become
a completely negligible 1.6 x 10^-3.
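
The deadtime scaling with buffer depth can be checked numerically with a short sketch. The function names are hypothetical, and the power law assumes, as the text does, that a hit is lost only when every stretcher stage is busy.

```python
import math

def buffered_deadtime(single_fraction, n_buffers):
    """Deadtime fraction with n_buffers stretcher stages per input,
    following F_dead = F_single ** N."""
    if not 0.0 <= single_fraction <= 1.0:
        raise ValueError("single_fraction must lie in [0, 1]")
    if n_buffers < 1:
        raise ValueError("n_buffers must be at least 1")
    return single_fraction ** n_buffers

def buffers_needed(single_fraction, target_fraction):
    """Smallest buffer depth whose deadtime is at or below the target."""
    if not 0.0 < single_fraction < 1.0 or target_fraction <= 0.0:
        raise ValueError("need 0 < single_fraction < 1 and target > 0")
    n = 1
    while buffered_deadtime(single_fraction, n) > target_fraction:
        n += 1
    return n

for depth in (1, 2, 4):
    print(f"N = {depth}: deadtime = {buffered_deadtime(0.20, depth):.4f}")
print("depth needed for 1% from 20%:", buffers_needed(0.20, 0.01))
```

The values for N = 2 and N = 4 reproduce the 4% and 1.6 x 10^-3 figures quoted for a 20% single-stretcher deadtime.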

Apart from the arming circuitry, the second time
stretcher channel is identical to the primary. Testing
was performed with double-pulse events and the
result for the second channel is seen in Fig. 14.



[Fig. 14 plot title: first pulse width vs stretched time]

FIG. 14: Cross-check measurement of the secondary
MTS1 output channel, where the results are seen to be
comparable to the primary channel, apart from a
systematically smaller stretch factor as described in the text.

Note that these secondary channels have a time- 
stretch factor that is systematically smaller. As the 
same reference currents are mirrored in all channels, 
it is believed that this is due to ramp window reduc- 
tion due to latency in the arming logic. 



3. Cross-talk 

An important check of performance of the MTS1
is the impact of time stretcher operation on one
channel while another is operating. This check is
shown in Fig. 15, where the timing of the first
channel is fixed and the timing relation of the signal
in channel 2 is varied.

The impact of operation of this second channel
is clear during the ramping portion of the readout
cycle, as well as at the threshold crossing at the end
of the ramping interval. While this effect can be
calibrated out to some extent, just like the effects of
the clock feedthrough, this perturbation to the circuit
would be better mitigated through better isolation
in the IC layout.



[Fig. 15 horizontal axis: time between CH1 and CH2 (ns)]



FIG. 15: Timing shift due to adjacent channel crosstalk. 
As expected, impact is most sensitive during the initial 
current ramping and near stretched time threshold cross- 
ing. 



For many applications the HPTDC is perfectly
suitable and gives comparable time resolution to the
MTS1 + FPGA TDC. In both cases a non-linearity
correction is required to obtain this resolution. However,
the time encoding itself is only part of the issue
in obtaining excellent timing resolution from a detector
output. Correction for time slew in the discriminator
threshold crossing is critical. Moreover,
the addition of many channels of high-speed
discriminators inside a detector is a noise and power concern.
Compact, high-speed waveform recording [22] may
be a promising next evolutionary step in the readout
of precision timing detectors.

V. FUTURE PROSPECTS



An improved layout paired with future, higher
clock frequency FPGAs could open the possibility of
very dense channel count, sub-10ps resolution TDC
recording.